Accurate Modeling and Generation of Storage I/O for Datacenter Workloads
Abstract
Tools that confidently recreate I/O workloads have become a critical requirement in designing efficient storage systems for datacenters (DCs), since potential inefficiencies are aggregated over several thousand servers. Designing performance-, power- and cost-optimized systems requires a deep understanding of target workloads and mechanisms to effectively model different design choices. Traditional benchmarking is invalid in cloud data stores and representative storage profiles are hard to obtain, while replaying the entire application in all storage configurations is impractical. Despite these issues, current workload generators are not comprehensive enough to accurately reproduce key aspects of real application patterns. These aspects include spatial and temporal locality, as well as the ability to tune the intensity of the workload to emulate different storage system behaviors. To address these limitations, we use a state diagram-based storage model, extend it to a hierarchical representation and implement a tool that consistently recreates the I/O loads of DC applications. We present the design of the tool and the validation process performed against six original DC application traces. We explore the practical applications of this methodology in two important storage challenges: 1) SSD caching and 2) defragmentation benefits on enterprise storage. In both cases we observe significant storage speedup for most of the DC applications. Since knowledge of the workload's spatial locality is necessary to model these use cases, our tool was instrumental in quantifying their performance benefits.
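To make the idea of a state diagram-based, hierarchical workload model more concrete, here is a minimal sketch in Python. It is not the authors' tool. The region names, address ranges, transition probabilities, read/write ratio, the uniform second-level split and the exponential inter-arrival times are all hypothetical values chosen for illustration. Each top-level state stands for a range of logical block addresses (LBAs), transitions between states capture spatial locality, and an `intensity` parameter scales the request rate.

```python
import random

# Hypothetical model: each top-level state covers an LBA range.
# All numbers are illustrative assumptions, not values from the paper.
REGIONS = {
    "A": (0, 1000),
    "B": (1000, 2000),
    "C": (2000, 3000),
}
# Transition probabilities between top-level states (each row sums to 1).
TRANSITIONS = {
    "A": {"A": 0.80, "B": 0.15, "C": 0.05},
    "B": {"A": 0.10, "B": 0.80, "C": 0.10},
    "C": {"A": 0.05, "B": 0.15, "C": 0.80},
}
SUBREGIONS = 4    # assumed second level: equal sub-ranges per region
READ_RATIO = 0.7  # assumed fraction of read requests


def next_state(rng, state):
    """Pick the next top-level state from the current state's row."""
    targets = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def generate(n, seed=0, start="A", intensity=1.0):
    """Yield n synthetic requests as (time, lba, op) tuples.

    intensity is the mean number of requests per unit time.
    """
    rng = random.Random(seed)
    state, now = start, 0.0
    for _ in range(n):
        lo, hi = REGIONS[state]
        width = (hi - lo) // SUBREGIONS
        sub = rng.randrange(SUBREGIONS)        # second-level choice
        lba = lo + sub * width + rng.randrange(width)
        op = "R" if rng.random() < READ_RATIO else "W"
        now += rng.expovariate(intensity)      # inter-arrival time
        yield (now, lba, op)
        state = next_state(rng, state)
```

Calling `list(generate(1000, intensity=2.0))` yields a synthetic trace whose addresses tend to stay within one region for several requests in a row, and whose request rate is on average twice that of the default setting.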
Related resources
Time and Cost-Efficient Modeling and Generation of Large-Scale TPCC/TPCE/TPCH Workloads
Large-scale TPC workloads are critical for the evaluation of datacenter-scale storage systems. However, these workloads have not previously been characterized in depth and modeled in a DC environment. In this work, we categorize the TPC workloads into storage threads that have unique features and characterize the storage activity of TPCC, TPCE and TPCH based on I/O traces from real server ins...
Synthesizing Representative I/O Workloads Using Iterative Distillation
Storage systems designers are still searching for better methods of obtaining representative I/O workloads to drive studies of I/O systems. Traces of production workloads are very accurate, but inflexible and difficult to obtain. (Privacy and performance concerns discourage most system administrators from collecting such traces and making them available to the public.) The use of synthetic work...
On Modeling the Relative Fitness of Storage (CMU-PDL-07-108)
Storage management is usually handled by skilled system administrators. The specific task of configuring and allocating disk space for applications, often referred to as storage system design, is especially time-consuming and error-prone. Automated storage system design, a solution proposed by many, relies on fast and accurate performance predictions. However, challenges with conventional perfor...
On modeling the relative fitness of storage
Storage management is usually handled by skilled system administrators. The specific task of configuring and allocating disk space for applications, often referred to as storage system design, is especially time-consuming and error-prone. Automated storage system design, a solution proposed by many, relies on fast and accurate performance predictions. However, challenges with conventional perfor...
CCM: Scalable, On-Demand Compute Capacity Management for Cloud Datacenters
We present CCM (Cloud Capacity Manager), a prototype system and methods for dynamically multiplexing the compute capacity of cloud datacenters at scales of thousands of machines, for diverse workloads with variable demands. This enables mitigation of resource consumption hotspots and handling unanticipated demand surges, leading to improved resource availability for applications and better d...
Publication date: 2011